The Stable Signature: Rooting Watermarks in Latent Diffusion Models
Generative image modeling enables a wide range of applications but raises
ethical concerns about responsible deployment. This paper introduces an active
strategy combining image watermarking and Latent Diffusion Models. The goal is
for all generated images to conceal an invisible watermark allowing for future
detection and/or identification. The method quickly fine-tunes the latent
decoder of the image generator, conditioned on a binary signature. A
pre-trained watermark extractor recovers the hidden signature from any
generated image and a statistical test then determines whether it comes from
the generative model. We evaluate the invisibility and robustness of the
watermarks on a variety of generation tasks, showing that Stable Signature
works even after the images are modified. For instance, it detects the origin
of an image generated from a text prompt, then cropped to keep 10% of the
content, with 90+% accuracy at a false positive rate below 10⁻⁶.
Comment: Website at https://pierrefdz.github.io/publications/stablesignatur
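The detection step is essentially a bit-matching hypothesis test: count how many of the k extracted bits agree with the enrolled signature and reject the null hypothesis (non-watermarked image, i.i.d. random bits) if that agreement is too improbable. A minimal sketch in Python, assuming a binomial null model (the function and threshold here are illustrative, not the paper's exact implementation):

# Hedged sketch: flag an image as watermarked if the extracted k-bit message
# matches the enrolled signature better than chance would allow.
from scipy.stats import binom

def detect(extracted_bits, signature_bits, fpr_target=1e-6):
    k = len(signature_bits)
    matches = sum(int(a == b) for a, b in zip(extracted_bits, signature_bits))
    # p-value: probability that random bits match at least this well
    p_value = binom.sf(matches - 1, k, 0.5)
    return p_value < fpr_target, p_value

# Example: 45 of 48 bits match -> p-value ~ 7e-11, well below the target FPR
flagged, p = detect([1] * 45 + [0] * 3, [1] * 48)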
Superfilamentation in air
The interaction between a large number of laser filaments brought together
using weak external focusing leads to the emergence of few filamentary
structures reminiscent of standard filaments, but carrying a higher intensity.
The resulting plasma is measured to be one order of magnitude denser than for
short-scale filaments. This new propagation regime is dubbed
superfilamentation. Numerical simulations of a nonlinear envelope equation
provide good agreement with experiments.
Comment: 5 pages, 4 figures
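For context, filamentation simulations of this kind typically solve a nonlinear envelope equation coupling diffraction, Kerr self-focusing, plasma defocusing, and multiphoton absorption; a generic form (not necessarily the exact model solved in the paper) reads

\[ \frac{\partial \mathcal{E}}{\partial z} = \frac{i}{2k_0}\nabla_\perp^2\mathcal{E} + i k_0 n_2 |\mathcal{E}|^2\mathcal{E} - \frac{i k_0}{2\rho_c}\,\rho\,\mathcal{E} - \frac{\beta_K}{2}|\mathcal{E}|^{2K-2}\mathcal{E}, \qquad \frac{\partial \rho}{\partial t} = \sigma_K \rho_{\mathrm{at}} |\mathcal{E}|^{2K}, \]

where \(\rho\) is the electron density, \(\rho_c\) the critical plasma density, \(n_2\) the Kerr index, and \(\beta_K\), \(\sigma_K\) the K-photon absorption and ionization coefficients.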
Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards
Foundation models are first pre-trained on vast unsupervised datasets and
then fine-tuned on labeled data. Reinforcement learning, notably from human
feedback (RLHF), can further align the network with the intended usage. Yet the
imperfections in the proxy reward may hinder the training and lead to
suboptimal results; the diversity of objectives in real-world tasks and human
opinions exacerbate the issue. This paper proposes embracing the heterogeneity
of diverse rewards by following a multi-policy strategy. Rather than focusing
on a single a priori reward, we aim for Pareto-optimal generalization across
the entire space of preferences. To this end, we propose rewarded soup, first
specializing multiple networks independently (one for each proxy reward) and
then interpolating their weights linearly. This succeeds empirically because we
show that the weights remain linearly connected when fine-tuned on diverse
rewards from a shared pre-trained initialization. We demonstrate the
effectiveness of our approach for text-to-text (summarization, Q&A, helpful
assistant, review), text-image (image captioning, text-to-image generation,
visual grounding, VQA), and control (locomotion) tasks. We hope this approach
will enhance the alignment of deep models and the way they interact with the
world in all its diversity.
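The interpolation at the heart of rewarded soups is a single convex combination of parameters. A minimal sketch (hypothetical helper; assumes all networks were fine-tuned from the same pre-trained initialization and share one architecture):

# Hedged sketch: build a rewarded soup by linearly interpolating the weights
# of N reward-specialized networks (PyTorch-style state dicts).
def rewarded_soup(state_dicts, coeffs):
    assert abs(sum(coeffs) - 1.0) < 1e-6, "coefficients must sum to 1"
    return {
        name: sum(c * sd[name] for c, sd in zip(coeffs, state_dicts))
        for name in state_dicts[0]
    }

# Example: uniform soup of three policies fine-tuned on different rewards
# soup = rewarded_soup([sd_reward_a, sd_reward_b, sd_reward_c], [1/3, 1/3, 1/3])

Sweeping the coefficients over the simplex then traces an approximation of the Pareto front of preferences without any retraining.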
Semantic image editing from textual queries
The aim of this thesis is to propose algorithms for the task of Text-based Image Editing (TIE), which consists in editing digital images according to an instruction formulated in natural language. For instance, given an image of a dog and the query "Change the dog into a cat", we want to produce a novel image where the dog has been replaced by a cat, keeping all other image aspects unchanged (animal color and pose, background). The north-star goal is to enable anyone to edit their images using only queries in natural language. One specificity of text-based image editing is that there is practically no training data to train a supervised algorithm. In this thesis, we propose different solutions for editing images, based on the adaptation of large multimodal models trained on huge datasets.
We first study a simplified editing setup, named retrieval-based image editing, which does not require directly modifying the input image. Instead, given the image and the modification query, we search a large database for an image that corresponds to the requested edit. We leverage multimodal image/text alignment models trained on web-scale datasets (such as CLIP) to perform such transformations without any examples. We also propose the SIMAT framework for evaluating retrieval-based image editing.
We then study how to directly modify the input image. We propose FlexIT, a method which iteratively changes the input image until it satisfies an abstract "editing objective" defined in a multimodal embedding space, introducing a variety of regularization terms to enforce realistic transformations. Next, we focus on diffusion models, powerful generative models able to synthesize novel images conditioned on a wide variety of textual prompts. We demonstrate their versatility by proposing DiffEdit, an algorithm which adapts diffusion models to image editing without fine-tuning, together with a zero-shot strategy for automatically finding where the initial image should be changed to satisfy the text transformation query. Finally, we study a specific challenge useful in the context of image editing: synthesizing a novel image under the constraint of a spatial layout of objects with textual descriptions, a task known as Semantic Image Synthesis. We adopt the same strategy, adapting diffusion models to solve the task without any example, and propose the ZestGuide algorithm, which leverages the spatio-semantic information encoded in the attention layers of diffusion models.
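The retrieval-based editing setup described above can be sketched with off-the-shelf image/text embeddings: move the query image's embedding along the text direction from the source concept to the target concept, then retrieve the nearest database image. A short illustrative sketch (the names and the scalar lam are assumptions, not the thesis' exact formulation):

# Hedged sketch of retrieval-based image editing with CLIP-style embeddings.
import numpy as np

def retrieve_edit(img_emb, src_text_emb, tgt_text_emb, db_embs, lam=1.0):
    # e.g. src = "a dog", tgt = "a cat": shift the image embedding accordingly
    query = img_emb + lam * (tgt_text_emb - src_text_emb)
    query /= np.linalg.norm(query)
    scores = db_embs @ query           # cosine similarity (rows pre-normalized)
    return int(np.argmax(scores))      # index of the best-matching image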
Functional invariants to watermark large transformers
The rapid growth of transformer-based models increases concerns about their integrity and ownership. Watermarking addresses this issue by embedding a unique identifier into the model while preserving its performance. However, most existing approaches require optimizing the weights to imprint the watermark signal, which is not suitable at scale due to the computational cost. This paper explores watermarks with virtually no computational cost, applicable to a non-blind white-box setting (assuming access to both the original and watermarked networks). They generate functionally equivalent copies by leveraging the models' invariance, via operations such as dimension permutations or scaling/unscaling. This makes it possible to watermark models without any change in their outputs, and the watermark remains stealthy. Experiments demonstrate the effectiveness of the approach and its robustness against various model transformations (fine-tuning, quantization, pruning), making it a practical solution to protect the integrity of large models.
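The permutation invariance exploited here is easy to see on a two-layer MLP: reordering the hidden units and applying the matching reorder to the next layer's input weights leaves the function unchanged, so the chosen permutation can carry an identifier. A minimal PyTorch sketch (illustrative only; the paper's actual watermark embedding and extraction procedures are not shown):

# Hedged sketch: permute the hidden neurons of an MLP without changing outputs.
import torch
import torch.nn as nn

def permute_hidden(mlp, perm):
    fc1, fc2 = mlp[0], mlp[2]  # Linear -> ReLU -> Linear
    with torch.no_grad():
        fc1.weight.copy_(fc1.weight[perm])     # reorder hidden neurons
        fc1.bias.copy_(fc1.bias[perm])
        fc2.weight.copy_(fc2.weight[:, perm])  # reorder matching inputs
    return mlp

mlp = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
x = torch.randn(2, 8)
y0 = mlp(x)
permute_hidden(mlp, torch.randperm(16))
assert torch.allclose(y0, mlp(x), atol=1e-6)   # functionally equivalent copy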
Generation of long-lived underdense channels using femtosecond filamentation in air
Using femtosecond laser pulses at 800 and 400 nm, we characterize by means of transverse interferometry the formation of underdense channels in air generated by laser filamentation at the millijoule energy level. We find that under tight focusing conditions, filamentation generates a shock wave and that the resulting low-density channel lasts for more than 90 ms. Comparison of these results with simulations using an Eulerian hydrodynamic code shows good agreement and allows us to estimate the initial gas peak temperature at ∼1000 K. The influence of experimental parameters such as the focusing conditions of the ultrashort laser pulse, its polarization, and its wavelength is studied and linked to previous characterizations of filamentation-generated plasma columns.
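As a back-of-the-envelope consistency check (ours, not the paper's hydrodynamic simulation): once the heated column has relaxed back to ambient pressure, the ideal gas law at constant pressure gives

\[ \frac{\rho}{\rho_0} = \frac{T_0}{T} \approx \frac{300\,\mathrm{K}}{1000\,\mathrm{K}} \approx 0.3, \]

so an initial peak temperature of ∼1000 K is indeed compatible with a strongly underdense channel.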